Predictive model for early detection of type 2 diabetes using patients' clinical symptoms, demographic features, and knowledge of diabetes

Abstract
Background and Aims: With the global rise in type 2 diabetes, predictive modeling has become crucial for early detection, particularly in populations with low routine medical checkup profiles. This study aimed to develop a predictive model for type 2 diabetes using health check‐up data focusing on clinical details, demographic features, biochemical markers, and diabetes knowledge.
Methods: Data from 444 Nigerian patients were collected and analysed. We used 80% of this data set for training, and the remaining 20% for testing. Multivariable penalized logistic regression was employed to predict the disease onset, incorporating waist‐hip ratio (WHR), triglycerides (TG), catalase, and atherogenic indices of plasma (AIP).
Results: The predictive model demonstrated high accuracy, with an area under the curve of 99% (95% CI = 97%–100%) for the training set and 94% (95% CI = 89%–99%) for the test set. Notably, an increase in WHR (adjusted odds ratio [AOR] = 70.35; 95% CI = 10.04–493.1, p‐value < 0.001) and elevated AIP (AOR = 4.55; 95% CI = 1.48–13.95, p‐value = 0.008) levels were significantly associated with a higher risk of type 2 diabetes, while higher catalase levels (AOR = 0.33; 95% CI = 0.22–0.49, p‐value < 0.001) correlated with a decreased risk. In contrast, TG levels (AOR = 1.04; 95% CI = 0.40–2.71, p‐value = 0.94) were not associated with the disease.
Conclusion: This study emphasizes the importance of using distinct clinical and biochemical markers for early type 2 diabetes detection in Nigeria, reflecting global trends in diabetes modeling, and highlighting the need for context‐specific methods. The development of a web application based on these results aims to facilitate the early identification of individuals at risk, potentially reducing health complications, and improving diabetes management strategies in diverse settings.


| INTRODUCTION
5][6] However, this clinical stage is often overlooked because it is usually asymptomatic in the affected individuals. 6 Several other predisposing factors have been linked to the onset of type 2 diabetes, including regular consumption of foods with a high glycaemic index, 7 obesity, high-density lipoprotein (HDL) and low-density lipoprotein (LDL) levels, and hip and waist circumference measurements. 810] Diabetes is a chronic condition that affects millions of people globally and is projected to be one of the leading causes of noncommunicable mortality by 2030. 11 The global burden and epidemiological trends of type 2 diabetes mellitus have been extensively documented, highlighting its growing impact on public health systems worldwide. Khan et al. 12 provided a comprehensive overview of the global epidemiology of type 2 diabetes, emphasizing its increasing prevalence and the urgent need for effective management strategies. This perspective is particularly relevant in the context of recent health crises, such as the COVID-19 pandemic, where Huang et al. 13 identified a significantly increased risk of severe outcomes and mortality among hospitalized patients with diabetes in Mexico. These findings emphasize the critical need for effective diabetes management and preventive strategies. Furthermore, the complexity of type 2 diabetes as a multifactorial disease has been well articulated by Chatterjee et al., 14 who underscored the intricate interplay of genetic, environmental, and lifestyle factors in its pathogenesis and progression.
Unnikrishnan et al. 15 also shed light on the diabetes epidemic in India, underscoring the high prevalence of complications and pressing the need for improved healthcare responses. Type 2 diabetes not only has immediate health implications, but also exerts significant economic strain globally. A systematic review by Seuring et al. 16 elucidated the extensive economic costs associated with type 2 diabetes, including both direct medical costs and indirect costs such as lost productivity due to disease-related morbidity and mortality.
These studies provide a comprehensive picture of the global challenges posed by type 2 diabetes and reinforce the urgency for research efforts, such as the present study, which aims to enhance early detection and intervention, mitigate complications, and ultimately alleviate the global burden of this pervasive condition.
Nigeria is the most populous country in Africa and has a large and increasing burden of diabetes, particularly type 2 diabetes mellitus. 17 However, since 1992, when a prevalence of 2.2% was reported, no national health survey has been conducted to determine the prevalence and risk factors of diabetes in the country. 18 This lack of current information on diabetes in Nigeria has hindered efforts to effectively manage the disease. Despite efforts to determine the prevalence and risk factors of diabetes in Nigeria, there are significant gaps in the country's management of the disease. 19 The Diabetes Association of Nigeria and the Endocrine and Metabolism Society of Nigeria are responsible for developing diabetes management guidelines. However, there remain unanswered research questions and practical gaps in the country's diabetes management practices. 19 Additionally, the sociocultural context in Nigeria influences healthcare providers' practices regarding self-management support; however, this aspect has not been fully explored. 20
Despite the global prevalence of type 2 diabetes, many individuals remain undiagnosed until complications arise. The prevention of acute problems and reduction in the risk of long-term complications rely on ongoing patient awareness, early diagnosis, and self-management. Substantial evidence supports the use of various therapies to improve outcomes. 21 Health check-ups are crucial for health management because of increased health awareness. 21 These examinations provide crucial information for disease diagnosis and patient care. Regular examinations by physicians provide the opportunity for early intervention and can help detect risk factors for chronic conditions such as type 2 diabetes.
However, different individuals have varying self-care and routine medical checkup habits, and some populations have low annual health checkup compliance. 22 This emphasizes the need for advanced predictive models to facilitate early detection and targeted intervention strategies, thereby mitigating the global impact of this chronic condition.
Recent advancements in diabetes prediction have introduced various models, each contributing uniquely to the field. Studies have employed diverse methodologies ranging from neural networks, as seen in multilayer and probabilistic models, 23 to machine learning techniques such as the hybrid-twin support vector machine (SVM). 24 Other approaches include categorizing treatment plans using J48 classifiers, 25 developing diagnostic tools that combine fuzzy logic, neural networks, and case-based reasoning, 26 and applying hybrid models such as kernel SVM for high-accuracy diagnosis. 27 Notably, some studies focused on lifestyle-related risk prediction using the PIMA Indian diabetes data set, 28 while others such as Jahani et al. 29 and Hashi et al. 30 emphasized neural network-based models for disease onset and progression. Additionally, innovative techniques such as controlled binning and multiple regression 31 and noninvasive glucose estimation using an elastic net model 32 further illustrate the diverse range of predictive strategies being explored in diabetes research. Alix et al. 33 developed a predictive model for type 2 diabetes using clinical and demographic parameters. Lai et al. 34 developed a predictive model to identify Canadian patients at risk of diabetes using demographic data and laboratory results from medical visits. Furthermore, racial disparities were examined to assess the effectiveness of risk prediction models for incident type 2 diabetes. 35 In line with previous diabetes research, our study adopted a comprehensive multivariate framework.
This approach was designed to integrate a broader spectrum of variables, including clinical symptoms, demographic characteristics, patients' knowledge of diabetes, and biochemical data, into a predictive model. Such an integration aligns with the recent shift in diabetes research towards more sophisticated, data-intensive models that aim to capture the multifaceted nature of the disease. The necessity of this approach has been underscored in recent literature, with studies by Collin et al., 36 Fregoso-Aparisio et al., 37 Eyiji et al., 38 and Tuppad et al. 39 highlighting the importance of incorporating multiple risk factors into predictive models for diabetes. Furthermore, recognizing the diverse manifestations of type 2 diabetes across different populations, our study specifically focused on the Nigerian context. This study addresses a notable gap in the literature that has predominantly concentrated on Western populations. Uloko et al., 18 Okoro et al., 40 Chinenye et al., 18 and Fasanmade et al. 41 highlighted the unique epidemiological and clinical characteristics of type 2 diabetes in African populations, underscoring the need for predictive models tailored to these specific demographic profiles. Our study's application of a predictive model in Nigeria not only contributes to a more global understanding of type 2 diabetes but also demonstrates the adaptability of such models to varied settings.
Hence, this study utilized health checkup data from patients at a Nigerian diabetes clinic and incorporated a range of indicators to predict the onset of diabetes. By integrating clinical symptoms, demographic features, and patients' knowledge of diabetes, we aimed to enhance the early detection and management of this condition. Our approach aligns with the increasing use of machine learning models in medical research, offering new insights into the complex interplay between the factors leading to type 2 diabetes. Such an endeavor is crucial for the early identification of high-risk individuals. By identifying patients at high risk of developing type 2 diabetes early, healthcare professionals can provide personalized education and care to prevent complications and improve outcomes, ultimately leading to better overall health and well-being worldwide.

| Study area, design, and participants
This hospital-based case-control study was conducted from October 2018 to March 2021 and recruited patients diagnosed with type 2 diabetes mellitus using a convenience sampling technique. 42 A total of 444 participants were selected based on accessibility and willingness to participate. The data included 43 characteristics, such as clinical details, demographic information, and knowledge and attitudes toward diabetes. Type 2 diabetes mellitus was diagnosed by the attending physician using the American Diabetes Association criteria, [4][5][6] with a fasting blood sugar (FBS) level of 126 mg/dL or higher. 21 The control group underwent a health examination and was confirmed to be free of diabetes based on FBS levels measured with a well-calibrated Accu-Chek glucometer strip and a confirmatory laboratory report of fasting blood sugar and glycated hemoglobin. All the participants completed a structured questionnaire that captured their demographic information.

| Inclusion and exclusion criteria
6]21 Obesity was defined as a BMI greater than 30 kg/m². 43,44 Pregnant women, those with persistent alcoholism, and those with a history of hepatitis were excluded. 43,44

| Outcomes
The endpoint of this study was the identification of patients with type 2 diabetes mellitus. Patients with FBS levels ≥126 mg/dL were classified as having confirmed diabetes, and those with levels below 126 mg/dL were labeled as nondiabetic.

| Features
Several potential biomarkers of type 2 diabetes have been identified, including biochemical and clinical parameters, 45 demographic characteristics, and patients' knowledge and attitudes towards the disease. 46 The biochemical and clinical parameters used in the study include apolipoprotein C-III (APO-CIII), systolic blood pressure (SysBp), diastolic blood pressure (DiaBp), hypertensive status, hypertensive group, waist circumference (WC), hip circumference (HC), waist-hip ratio (WHR), total cholesterol (TC), triglyceride (TG), high-density lipoprotein cholesterol (HDL-c), low-density lipoprotein cholesterol (LDL-c), atherogenic indices of plasma (AIP), cardiac risk ratio (CRR), non-high-density lipoprotein cholesterol (Non-HDL-c), atherogenic coefficient, malondialdehyde (MDA), superoxide dismutase (SOD), catalase, body mass index (BMI), carbohydrate counting (CHO), hemoglobin A1C (HbA1C), retinopathy, nephropathy, feet neuropathy, heart attack, slowed digestion, and gastroparesis. Demographic features included the sex and age of the patients, and the study also considered questions about the knowledge and attitudes of patients towards diabetes, as well as other factors that have been reported in the literature. 46,47 Details regarding the measurements of the biochemical and clinical parameters used in the study and their units are provided in Supplementary Table 1.

| Model cross-validation
Nested cross-validation was performed on the training data set using two levels of stratified cross-validation involving inner and outer folds to obtain good classification accuracy and prevent overfitting.
The model parameters were optimized, and informative feature subsets were determined, in the inner folds, while the performance of the best inner model was assessed in the outer fold. For the outer loop, the training data set was split into 10 stratified folds, with one fold held out as the outer test set. The remaining nine folds were then split again into 10 stratified inner folds, nine for model training and one for testing, to provide an unbiased evaluation of the model fit on the inner training set while tuning the model's hyperparameters and selecting optimal features. Twenty repetitions of the outer and inner folds were performed to obtain a robust model; the outer and inner folds were stratified so that each fold preserved the class proportions of the imbalanced data set.
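The fold structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the study used R and the caret library): the helper names `stratified_folds` and `nested_cv_splits` and the seeds are hypothetical, and only one of the twenty repetitions is generated. The class counts (172 with and 140 without diabetes) are those reported for the training set.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed):
    """Split sample positions into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def nested_cv_splits(labels, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for one repetition."""
    outer = stratified_folds(labels, outer_k, seed)
    for i, outer_test in enumerate(outer):
        outer_train = [idx for j, fold in enumerate(outer) if j != i for idx in fold]
        inner_labels = [labels[idx] for idx in outer_train]
        inner = stratified_folds(inner_labels, inner_k, seed + i + 1)
        inner_splits = []
        for m, held_out in enumerate(inner):
            # Inner folds hold positions within outer_train; map back to sample indices.
            inner_test = [outer_train[p] for p in held_out]
            inner_train = [outer_train[p] for n, f in enumerate(inner) if n != m for p in f]
            inner_splits.append((inner_train, inner_test))
        yield outer_train, outer_test, inner_splits


# Class counts reported for the training set: 172 with and 140 without diabetes.
labels = [1] * 172 + [0] * 140
splits = list(nested_cv_splits(labels))
```

Hyperparameter tuning and feature selection would use only the `inner_splits`, so that each `outer_test` fold stays untouched until the selected model is evaluated.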

| Optimal feature selection and hyperparameters
Sequential backward selection was employed for feature selection, starting with all features and eliminating noninformative features in each iteration to enhance the performance of the model. This process was continued until no further improvements were observed. Once the optimal combination of hyperparameters and feature subset was identified to maximize the performance metrics in the inner test set, the model was retrained on the outer training set and tested on the test set from the outer CV. Subsequently, the feature subsets from all outer folds were combined using a voting strategy that retained features selected in more than 50% of the outer folds as informative; these features were chosen as the final feature subset. The median of the best hyperparameters from the outer CV folds was used to fit the final model. Finally, summary performance estimates were generated by averaging the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.
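As a rough illustration of the selection and voting steps, the Python sketch below implements a greedy backward elimination driven by a caller-supplied scoring function, a majority vote over per-fold feature subsets, and a median over per-fold hyperparameters. The function names, the toy scoring function, and the per-fold subsets and penalty values are hypothetical placeholders, not results from the study; in practice the score would be a cross-validated performance metric such as the AUC.

```python
from collections import Counter
from statistics import median


def backward_selection(features, score):
    """Drop one feature at a time while the score keeps improving."""
    current = list(features)
    best = score(current)
    while len(current) > 1:
        candidates = [[f for f in current if f != drop] for drop in current]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score <= best:  # no further improvement: stop
            break
        current, best = top, top_score
    return current


def vote_features(selected_per_fold, threshold=0.5):
    """Keep features selected in more than `threshold` of the outer folds."""
    n_folds = len(selected_per_fold)
    counts = Counter(f for subset in selected_per_fold for f in set(subset))
    return sorted(f for f, c in counts.items() if c / n_folds > threshold)


# Toy score (assumption): reward informative features, penalise subset size.
informative = {"WHR", "catalase"}


def toy_score(subset):
    return len(informative & set(subset)) - 0.1 * len(subset)


kept = backward_selection(["WHR", "catalase", "noise1", "noise2"], toy_score)

# Hypothetical per-fold selections and best penalty values from four outer folds.
fold_subsets = [
    ["WHR", "catalase", "AIP", "TG"],
    ["WHR", "catalase", "AIP"],
    ["WHR", "catalase", "TG", "BMI"],
    ["WHR", "AIP", "TG"],
]
final_features = vote_features(fold_subsets)
final_penalty = median([0.01, 0.1, 0.1, 1.0])
```

Taking the median rather than the mean of the per-fold hyperparameters keeps a single extreme fold from dominating the value used to fit the final model.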

| Patient characteristics
The training set included 312 individuals, comprising 172 with and 140 without type 2 diabetes. There was no significant difference in the prevalence of obesity between the participants with and without diabetes (48.5% vs. 54.4%, respectively; p-value = 0.36). However, a marked difference was observed in the use of diabetes medication, with 100% of participants without diabetes not on medication compared with only 6% of participants with diabetes not on medication (p-value < 0.001). The median age of the patients was 40 years (Table 1). There was no significant difference in the age of the participants with and without diabetes. Female patients had a significantly higher incidence of diabetes than male patients (84.6% vs. 15.4%, p-value = 0.008).
Further analysis revealed that patients with diabetes had significantly higher levels of various biomarkers. For instance, the levels of AIP, CRR, non-HDL-c, atherogenic coefficient, MDA, WHR, CHO, TG, and APO-CIII were all significantly elevated in patients with diabetes (p-value < 0.001 for AIP, non-HDL-c, MDA, WHR, CHO, and TG; p-value = 0.002 for CRR and atherogenic coefficient; p-value = 0.02 for APO-CIII). Conversely, the antioxidant enzymes SOD and catalase were found to be significantly lower in these patients (p-value < 0.001 for both). Additionally, a higher proportion of individuals with diabetes were found to be taking glucose-lowering medications (100% of diabetic participants vs. 6% of nondiabetic participants; p-value < 0.001). In contrast, our analysis did not reveal significant associations between diabetes status and a range of clinical features such as body weight, height, BMI, SysBP, DiaBP, hypertension status, WC, HC, HDL-c, LDL-c, HbA1C, retinopathy, nerve or foot neuropathy, heart attack, or slow digestion.
Regarding beliefs about diabetes risk factors, individuals who did not acknowledge the roles of obesity, overeating, sugar- and fat-rich diets, insulin resistance, and heredity in the development of diabetes exhibited a higher incidence of diabetes. However, these differences were not statistically significant (p-value = 0.30 for obesity, overeating, sugar- and fat-rich diets; p-value = 0.50 for insulin resistance; p-value = 0.47 for heredity). Similarly, the belief that medication is more important than diet and exercise for diabetes control was not significantly associated with the incidence of diabetes (p-value = 0.24). Additionally, patients with diabetes were slightly more inclined to disbelieve that smoking and alcohol consumption contribute to disease complications, an association that approached but did not reach significance (p-value = 0.07). Finally, the baseline characteristics of the patients in the training and test sets were similar (Table 2).

| Univariate analysis
We first conducted a univariable analysis on the training set to investigate the association between each factor and diabetic status (Table 2).

| Multivariable analysis
Factors for which the univariate odds ratio was statistically significant were included as inputs in the multivariable logistic regression model.
From these variables, the final predictive model was built using nested cross-validation, and included WHR, catalase, TG, AIP, and SOD as informative features (Figure 1), with training and test AUC values of 99% (95% CI = 97%-100%) and 94% (95% CI = 89%-99%), respectively (Figure 2). The model training and test set accuracies were 95% and 91%, respectively. The sensitivity, specificity, PPV, and NPV of the model for the training and test datasets are presented in Table 3. These associations are consistent with previous findings, 48,49 confirming an association between type 2 diabetes and these clinical symptoms.

| Association between positive diabetes and model-selected predictors
1][52][53] This relationship is supported by the fact that central obesity, as measured by WC or WHR, produces diabetogenic substances that contribute to the progression of diabetes. 52 Seidell et al. 50 posited that the ratio of waist circumference to hip circumference is a significant predictor of the prevalence of diabetes mellitus in adult men and women, and abdominal computed tomography (CT) scan measurements of subcutaneous fat were less significantly associated with the accumulation of intra-abdominal (visceral) fat than the waist-to-hip circumference ratio. Therefore, it is important for patients to maintain a normal waist circumference and balanced waist-to-hip ratio to reduce their risk of developing type 2 diabetes mellitus.
5][56] Type 2 diabetes is often accompanied by dyslipidaemia and abnormal accumulation of lipids in the bloodstream. 57 Elevated levels of triglycerides, APO-CIII, non-HDL cholesterol, and atherogenic coefficient are indicative of dyslipidaemia. While a genetic predisposition has been linked to the risk of type 2 diabetes, 57,58 previous studies have also associated this condition with metabolic indicators, such as CHO, AIP, and CRR. 59 Additionally, FBS, blood pressure, and lipid profiles, including CHO, AIP, and CRR, have been correlated with the Indian Diabetes Risk Score (IDRS). 60 Type 2 diabetes is frequently characterized by high CHO and AIP levels, which increase the risk of cardiovascular disease. 61 Additionally, the associations between the risk of type 2 diabetes and MDA, SOD, and catalase levels were consistent with those reported in a previous study. 62 Hyperglycemia, a defining feature of type 2 diabetes, is a leading cause of oxidative stress and elevated MDA. 63 The development of diabetic complications such as cardiovascular disease (CVD) is heavily influenced by oxidative stress. 64 In addition, the relationship between type 2 diabetes and SOD activity has been examined in several studies. SOD is a vital antioxidant enzyme that protects cells against oxidative stress. 65 Patients with type 2 diabetes have been shown to have lower SOD activity, 65 a sign of higher oxidative stress, and catalase overexpression lowers the expression of angiotensinogen and apoptosis in diabetic mice. 66 Oxidative stress is a major contributor to diabetes complications such as CVD. Reducing oxidative stress and maintaining SOD, catalase, and MDA activities may be crucial components of type 2 diabetes care to reduce the risk of diabetic complications.
Numerous studies have demonstrated that high FBS levels are necessary for transition from a healthy state to diabetes mellitus. 67,68 boah et al. 69 claimed that FBS is a distinct risk factor for the
In addition to the established factors for diabetes prediction, our study incorporated an assessment of patients' knowledge and attitudes towards the disease. Our findings revealed a general deficit in the understanding of the risk factors associated with type 2 diabetes among the study participants. This lack of awareness may contribute to the prevalence of the disease, particularly in developing nations, where attitudes towards diabetes and its symptoms tend to be more dismissive. Interestingly, our analysis revealed that individuals who did not recognize the contribution of factors such as obesity, overeating, sugar- and fat-rich diets, insulin resistance, and heredity to the development of diabetes exhibited a higher incidence of the disease. However, these associations were not statistically significant, suggesting a potential gap in the awareness or even denial of these risk factors among individuals with diabetes. In light of these findings, we recommend re-evaluation of strategies to combat diabetes, with a primary focus on promoting awareness of its risk factors. Educating the public and fostering appropriate attitudes towards diabetes prevention are essential steps towards maintaining a diabetes-free society.
Additionally, the belief that medication is more important than diet for controlling diabetes did not show a significant association with the incidence of diabetes. This finding underscores the complexity of diabetes management and the need for comprehensive patient education. Unexpectedly, our analysis indicated that individuals taking glucose-lowering medications had a higher risk of developing type 2 diabetes. This finding may initially seem counterintuitive as these medications are typically prescribed for diabetes management. However, it is important to consider that many patients may not commence medication until disease progression, which could reflect a more advanced stage of diabetes at the time of diagnosis. This underscores the crucial role of early detection and intervention in the management of type 2 diabetes. Further research is needed to understand the causal relationships and the underlying mechanisms of this association. Our data also suggest a slightly higher tendency among patients with diabetes to not believe that smoking and alcohol consumption contribute to diabetic complications. This further highlights the importance of patient education on lifestyle factors in diabetes management.
Another important aspect of our study was the use of machine learning techniques to predict the occurrence of diabetes using routinely collected data. By utilizing multivariate penalized logistic regression implemented under nested cross-validation with sequential backward feature selection, our model maximized its predictive power while minimizing the risk of overfitting. This data-driven approach facilitates the identification of key predictors of type 2 diabetes and provides a more precise prediction of diabetes risk at the individual level.
Although effective in elucidating associations, a case-control design is inadequate for establishing causality. Additionally, reliance on structured questionnaires may have introduced recall bias due to potential misreporting by the participants. Exclusion of specific groups, such as pregnant women and those with persistent alcoholism or a history of hepatitis, could limit the applicability of our findings to these segments. Relying solely on FBS levels for diabetes diagnosis, despite its reliability, may overlook certain borderline or alternative diagnostic cases. Finally, potential unmeasured confounders and variability in biochemical and clinical measurements may have influenced the study outcomes.
Notwithstanding these limitations, our findings provide a basis for further comprehensive investigation on this topic.
In conclusion, we have successfully developed a highly accurate predictive model that can aid in the early identification and management of diabetes. The AUC obtained in our study was better than the 88% reported by Walford et al. 70 In addition, our predictive model demonstrated robust performance with training and test set accuracies of 95% and 91%, respectively, showing a notable advantage in terms of accuracy and reliability compared to similar studies in the field. For instance, while the model in Anand et al. 28 achieved a commendable 75% accuracy using the PIMA data set and CART classifier, our model surpassed this, indicating a higher predictive precision. Similarly, Jakhmola et al. 31 and Anand et al. 28 reported accuracies of 77.85% and 75%, respectively, which were lower than the performance of our model. Notably, our model even outperformed the 78.17% accuracy achieved by the decision tree model 71 and the 70.8% accuracy of the treatment classification model. 25 Moreover, while the expert healthcare system 30 achieved a high accuracy of 90.43%, our model slightly edged this out, particularly in the training phase. The elastic net model of Zanon et al. 32 also fell short of the accuracy of our model, underlining the effectiveness of our methodology. These comparisons underscore the high predictive capability of our model, which could be attributed to its comprehensive analytical approach, potentially making it a more reliable tool for early detection and management of diabetes.
Our study demonstrated a significant association between various clinical symptoms, demographic features, and patients' knowledge of diabetes in predicting the onset of the disease. Overall, our findings indicate a good predictive performance for type 2 diabetes and suggest that incorporating clinical symptoms, demographic features, and knowledge of diabetes can improve the accuracy of predictive models for type 2 diabetes. Therefore, it is imperative to raise awareness of diabetes risk factors, promote healthy lifestyles, and emphasize the importance of early diagnosis and treatment. Integrating these approaches into public health campaigns can help mitigate the prevalence of diabetes and its complications. In future studies, we intend to expand the validation of our model to encompass a more diverse range of populations. This, however, is dependent on obtaining datasets that accurately represent these groups. Through this endeavor, we aim to strengthen the generalizability and widespread applicability of our diabetes detection model.
This study included 444 patients, with 312 and 132 in the training and test sets, respectively. Patients' baseline characteristics were summarized using frequencies and proportions for categorical variables and medians and ranges for continuous variables. Baseline characteristics were compared between patients with and without type 2 diabetes using the Wilcoxon rank-sum test for continuous variables and Pearson's chi-square test for categorical variables, with Yates' continuity correction when appropriate. Continuous variables in the training set were scaled to have a mean of zero and standard deviation of one. The variables of the test set were mapped to the corresponding variables in the training cohort. Univariate logistic regression models were used to assess the association between the risk of type 2 diabetes and each of the patients' clinical, demographic, and diabetes-knowledge characteristics. A multivariate penalized logistic model implemented under stratified nested cross-validation for parameter optimization and sequential backward feature selection was used to predict the risk of type 2 diabetes.
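The scaling step can be illustrated with a short Python sketch. It assumes that "mapping" the test-set variables means standardizing them with the mean and standard deviation estimated on the training set, which is the usual way to avoid leaking test information; the text does not spell this out. The function names and the WHR values are hypothetical.

```python
from statistics import mean, stdev


def fit_scaler(train_values):
    """Estimate centring and scaling parameters from the training set only."""
    return mean(train_values), stdev(train_values)


def apply_scaler(values, center, scale):
    """Standardize values with previously estimated parameters."""
    return [(v - center) / scale for v in values]


# Hypothetical waist-hip ratios, for illustration only.
train_whr = [0.80, 0.90, 1.00]
test_whr = [0.95, 0.85]

center, scale = fit_scaler(train_whr)
train_scaled = apply_scaler(train_whr, center, scale)  # mean 0, SD 1 by construction
test_scaled = apply_scaler(test_whr, center, scale)    # reuses the training parameters
```

Because the test values are transformed with the training parameters, the scaled test set will generally not have a mean of exactly zero or a standard deviation of exactly one.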
To generate summary performance estimates, we averaged the AUC of the ROC curve and other performance measures, namely the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), across the CV folds. The sensitivity $\left(\frac{TP}{TP + FN}\right)$, specificity $\left(\frac{TN}{TN + FP}\right)$, PPV $\left(\frac{TP}{TP + FP}\right)$, and NPV $\left(\frac{TN}{TN + FN}\right)$, where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively, were calculated using the default cutoff value (0.5) for the positive or negative diabetic class. Model parameter values were chosen to maximize the predicted positive class. All statistical analyses were performed in R, and the final model was developed using the caret library (version 6.0.93). The final model's receiver operating characteristic (ROC) curve was drawn using the pROC library (version 1.18.0). Statistical significance was set at p-value < 0.05.
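The performance measures named above follow from the four confusion-matrix counts. The Python sketch below is illustrative only: the function names are hypothetical, the labels and predicted probabilities are made-up values rather than study data, and treating a probability equal to the cutoff as positive is an assumption.

```python
def confusion_counts(y_true, y_prob, cutoff=0.5):
    """Count TP, FP, TN, FN when probabilities >= cutoff are called positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def performance(tp, fp, tn, fn):
    """Standard definitions of the reported performance measures."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up labels (1 = diabetic) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

counts = confusion_counts(y_true, y_prob)
results = performance(*counts)
```

Changing `cutoff` trades sensitivity against specificity, which is why the fixed default of 0.5 is stated alongside the reported values.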

FIGURE 1 Important features of the penalized logistic regression model.
OJURONGBE ET AL.

FIGURE 2 ROC curves from the penalized logistic regression model for the training (green) and test (blue) sets.

TABLE 3 Performance of the penalized logistic regression model for training and test sets.

TABLE 4 Patients' clinical, demographic, and knowledge-of-diabetes characteristics in the training set. Odds ratios (AOR) from the univariable penalized logistic regression model.
5.4 | Web-based application for the prediction of type 2 diabetes
Our study presents a web-based application that uses the final multivariable model to enable the early prediction of type 2 diabetes. The application is available at https://iv3p9h-nurudeen-adegoke.shinyapps.io/Diabetic/ and can facilitate targeted interventions and lifestyle modifications to prevent or delay the onset of diabetes. Our app was validated and achieved high accuracy in predicting diabetes risk, making it a valuable tool for managing diabetes risk for both individuals and healthcare professionals.

6 | DISCUSSION
We used routinely collected sociodemographic data, clinical symptoms, and patients' knowledge of diabetes to estimate the risk of developing diabetes. Our predictive model, based on multivariable penalized logistic regression, achieved an AUC of 99% (95% CI = 97%-100%) for the training set and 94% (95% CI = 89%-99%) for the test set.
Although the application of machine learning techniques in this context is not novel, it demonstrates the potential
The convenience sampling technique utilized in our research may have resulted in selection bias, thereby restricting the generalizability of our findings beyond the study population. The study was conducted at a single hospital in Osogbo, Osun State, Nigeria and may not be representative of other geographical or sociocultural contexts.